PostgreSQL Page页结构解析 B-Tree索引行的存储结构

本文简单介绍了在PG数据库B-Tree索引的物理存储内容。

环境准备

创建数据表,插入数据并创建索引。

   
 -- 创建一张表,插入几行数据
drop table if exists t_index;
create table t_index (id int,c1 char(8),c2 varchar(16));
insert into t_index values(2,'1','a'
insert into t_index values(4,'2','b');
insert into t_index values(8,'3','c');
insert into t_index values(16,'4','d');

 -- 创建索引
 alter table t_index add constraint pk_t_index primary key(id);
 SELECT pg_relation_filepath('pk_t_index');

索引文件raw data

[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441
00000000  00 00 00 00 08 9f 79 01  00 00 00 00 48 00 f0 1f  |......y.....H...|
00000010  f0 1f 04 20 00 00 00 00  62 31 05 00 04 00 00 00  |... ....b1......|
00000020  01 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 f0 bf  |................|
00000040  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001ff0  00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  |................|
00002000  00 00 00 00 78 9e 79 01  00 00 00 00 28 00 b0 1f  |....x.y.....(...|
00002010  f0 1f 04 20 00 00 00 00  e0 9f 20 00 d0 9f 20 00  |... ...... ... .|
00002020  c0 9f 20 00 b0 9f 20 00  b0 9f 20 00 00 00 00 00  |.. ... ... .....|
00002030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003fb0  00 00 00 00 04 00 10 00  10 00 00 00 00 00 00 00  |................|
00003fc0  00 00 00 00 03 00 10 00  08 00 00 00 00 00 00 00  |................|
00003fd0  00 00 00 00 02 00 10 00  04 00 00 00 00 00 00 00  |................|
00003fe0  00 00 00 00 01 00 10 00  02 00 00 00 00 00 00 00  |................|
00003ff0  00 00 00 00 00 00 00 00  00 00 00 00 03 00 00 00  |................|
00004000

B-Tree索引物理存储结构

我们可以通过pageinspect插件查看索引的PostgreSQL page页结构解析 解析方法#pageinspect插件

Page 0是索引元数据页:

select * from page_header(get_raw_page('pk_t_index',0)); 
select * from bt_metap('pk_t_index'); 

root=1提示root页在第1页,通过page_header查看页头数据:

testdb=# select * from page_header(get_raw_page('pk_t_index',1));

每个索引entries结构为IndexTupleData+Bitmap+Value,其中IndexTupleData占8个字节,Bitmap占4个字节,Value占4字节,合计占用16个字节,数据结构如下:
index tuple 组成

组成部分 字节
InexTupleData 8 bytes
Bitmap 4 bytes
Value 4 bytes

InexTupleData 组成

组成部分 字节
t_tid 6 bytes
t_info 2 bytes
t_tid组成
组成部分 字节
ip_blkid 4 bytes
t_info 2 bytes
--------------------------------------itup.h-----------------------------------------
/*

* Index tuple header structure

*

* All index tuples start with IndexTupleData. If the HasNulls bit is set,

* this is followed by an IndexAttributeBitMapData. The index attribute

* values follow, beginning at a MAXALIGN boundary.

*

* Note that the space allocated for the bitmap does not vary with the number

* of attributes; that is because we don't have room to store the number of

 * attributes in the header. Given the MAXALIGN constraint there's no space

 * savings to be had anyway, for usual values of INDEX_MAX_KEYS.

typedef struct IndexTupleData
{
	ItemPointerData t_tid;		/* reference TID to heap tuple */

	/* ---------------
	 * t_info is laid out in the following fashion:
	 *
	 * 15th (high) bit: has nulls
	 * 14th bit: has var-width attributes
	 * 13th bit: AM-defined meaning
	 * 12-0 bit: size of tuple
	 * ---------------
	 */

	unsigned short t_info;		/* various info about tuple */

} IndexTupleData;				/* MORE DATA FOLLOWS AT END OF STRUCT */

 typedef IndexTupleData *IndexTuple;

typedef struct IndexAttributeBitMapData
{
	bits8		bits[(INDEX_MAX_KEYS + 8 - 1) / 8];
}			IndexAttributeBitMapData;

typedef IndexAttributeBitMapData * IndexAttributeBitMap;

// 说明: (32 + 8 - 1 /8)=4 ,表示4 个8位二进制,也就是4 bytes.
--------------------------------itemptr.h----------------------------------------
typedef struct ItemPointerData
{
	BlockIdData ip_blkid;
	OffsetNumber ip_posid;
}

/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
			pg_attribute_packed()
			pg_attribute_aligned(2)
#endif
ItemPointerData;
----------------------------------block.h----------------------------------------------
typedef struct BlockIdData
{
	uint16		bi_hi;
	uint16		bi_lo;
} BlockIdData;
----------------------------------off.h------------------------------------------------
typedef uint16 OffsetNumber;

通过bt_page_items函数查看索引entries:

testdb=# select * from bt_page_items('pk_t_index',1); 
 itemoffset | ctid  | itemlen | nulls | vars |          data           | dead | htid  | tids 
------------+-------+---------+-------+------+-------------------------+------+-------+------
          1 | (0,1) |      16 | f     | f    | 02 00 00 00 00 00 00 00 | f    | (0,1) | 
          2 | (0,2) |      16 | f     | f    | 04 00 00 00 00 00 00 00 | f    | (0,2) | 
          3 | (0,3) |      16 | f     | f    | 08 00 00 00 00 00 00 00 | f    | (0,3) | 
          4 | (0,4) |      16 | f     | f    | 10 00 00 00 00 00 00 00 | f    | (0,4) |

相应的物理索引文件内容:

[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441
00000000  00 00 00 00 08 9f 79 01  00 00 00 00 48 00 f0 1f  |......y.....H...|
00000010  f0 1f 04 20 00 00 00 00  62 31 05 00 04 00 00 00  |... ....b1......|
00000020  01 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 f0 bf  |................|
00000040  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001ff0  00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  |................|
00002000  00 00 00 00 78 9e 79 01  00 00 00 00 28 00 b0 1f  |....x.y.....(...|
00002010  f0 1f 04 20 00 00 00 00  e0 9f 20 00 d0 9f 20 00  |... ...... ... .|
00002020  c0 9f 20 00 b0 9f 20 00  b0 9f 20 00 00 00 00 00  |.. ... ... .....|
00002030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003fb0  00 00 00 00 04 00 10 00  10 00 00 00 00 00 00 00  |................|
00003fc0  00 00 00 00 03 00 10 00  08 00 00 00 00 00 00 00  |................|
00003fd0  00 00 00 00 02 00 10 00  04 00 00 00 00 00 00 00  |................|
00003fe0  00 00 00 00 01 00 10 00  02 00 00 00 00 00 00 00  |................|
00003ff0  00 00 00 00 00 00 00 00  00 00 00 00 03 00 00 00  |................|
00004000

查找index tuple 开始位置

echo $((0x00003fb0))
16304

ItemPointerData

 [xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 16304 -n 6
 00003fb0  00 00 00 00 04 00                                 |......|
00003fb6

-- blockid=\x0000,offset=\x0004

t_info

[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 16310 -n 2
00003fb6  10 00                                             |..|
00003fb8
t_info=\x0010,即16,表示tuple(索引项)大小为16个字

values

select * from t_index ;
 id |    c1    | c2 
----+----------+----
  2 | 1        | a
  4 | 2        | b
  8 | 3        | c
 16 | 4        | d
(4 rows)
[postgres@node1 ~]$ hexdump -C $PGDATA/base/13758/16441 -s 16312 -n 4
00003fb8  10 00 00 00                                       |....|
00003fbc
values=\x0010 ,即16 ,表示(0,4)的ID 值为16 . 

小结

page
16384(bytes)
page...
page
16384(bytes)
page...
page
16384(bytes)
page...
block number
block number
0th
0th
1th
1th
N-th
N-th
heap tuples
(record data)
heap tuples...
index file
index file
pd_lsn
pd_lsn
pd_checmsum
pd_checmsum
pd_flags
pd_flags
pd_lower
pd_lower
pd_upper
pd_upper
pd_special
pd_special
pd_pagesize_
version
pd_pagesize_...
pg_prune_xid
pg_prune_xid
1
1
2
2
tuple 1
tuple 1
line pointers
line pointers
free space
(hole)
free space...
pd_lower
pd_lower
pd_upper
pd_upper
lp_len
lp_len
lp_flags
lp_flags
lp_off
lp_off
2bits
2bits
15 bits
15 bits
15bits
15bits
t_tid
t_tid
t_info
t_info
ip_blkid
ip_blkid
ip_posid
ip_posid
4bytes
4bytes
2bytes
2bytes
2bytes
2bytes
8bytes
8bytes
values
values
4bytes
4bytes
IndexTupleData
IndexTupleData
Bitmap
Bitmap
4bytes
4bytes
8bytes
8bytes
16bytes
16bytes
Special Space
Special Space
2bytes
2bytes
btpo_prev
btpo_prev
btpo_next
btpo_next
btpo_level
btpo_level
btpo_flags
btpo_flags
btpo_cycleid
btpo_cycle...
4bytes
4bytes
4bytes
4bytes
4bytes
4bytes
2bytes
2bytes
2bytes
2bytes
16bytes
16bytes
24bytes
24bytes
8bytes
8bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
2bytes
4bytes
4bytes
0
0
Viewer does not support full SVG 1.1

小结一下,主要有以下几点:
1、数据存储:索引数据页头和与普通数据表页头一样的结构,占用24个字节,ItemIds占用4个字节;
2、索引entries:结构为IndexTupleData+Bitmap+Value;
3、内容查看:可通过pageinspect插件,推荐通过hexdump物理文件查看,有助于理解数据结构和数据的底层存储格式.
4、 根据t_info 里面算出values 的长度。